
Prediction of DNA-binding residues in proteins from amino acid sequences using a random forest model with a hybrid feature

Abstract

Motivation: In this work, we aim to develop a computational approach for predicting DNA-binding sites in proteins from their amino acid sequences. To reduce the risk of overfitting, all available DNA-binding proteins from the Protein Data Bank (PDB) are used to construct the models. The random forest (RF) algorithm is used because it is fast and performs robustly across different parameter values. A novel hybrid feature is presented that combines three sources of information: evolutionary information from the amino acid sequence, secondary structure (SS) information, and orthogonal binary vector (OBV) information, which encodes the 20 amino acids according to two physicochemical properties of their side chains (dipole and volume). The numbers of binding and non-binding residues in proteins are highly imbalanced, so a novel scheme is proposed that handles the imbalanced datasets by downsizing the majority class.
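
The abstract does not describe how the majority class is downsized, so the following is only a minimal sketch of the general idea, using plain random undersampling rather than the authors' scheme. The function name `downsize_majority`, the `ratio` parameter, the label convention (1 = binding, 0 = non-binding) and the toy data are all assumptions made for illustration.

```python
import random


def downsize_majority(samples, labels, ratio=1.0, seed=0):
    """Randomly undersample the non-binding class (label 0).

    Hypothetical helper: keeps every binding residue (label 1) and a
    random subset of non-binding residues, about ratio * (number of
    binding residues). This is generic random undersampling, not the
    scheme proposed in the paper.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n_keep = min(len(neg), int(round(ratio * len(pos))))
    # Sort the kept indices so residues stay in sequence order.
    kept = sorted(pos + rng.sample(neg, n_keep))
    return [samples[i] for i in kept], [labels[i] for i in kept]


# Toy data (made up): 20 residues, 3 of them labelled as binding.
labels = [1 if i in (2, 9, 15) else 0 for i in range(20)]
samples = [[float(i)] for i in range(20)]

X_bal, y_bal = downsize_majority(samples, labels)
```

With `ratio=1.0` the balanced set contains as many non-binding as binding residues; a classifier such as a random forest would then be trained on `X_bal` and `y_bal` instead of the full, imbalanced set.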
